Refactor availability-recovery strategies #1457

alindima · 2023-09-08T08:30:45Z

Refactors availability-recovery strategies to allow for easily adding new hotpaths and failover mechanisms.

The new interface allows chaining multiple RecoveryStrategy instances together, to cleanly express the relationship between them and to share state and code where necessary and possible.

This was done in order to aid in implementing new hotpaths like systematic chunks recovery and fetching from approval checkers.

Thanks to this design, intermediate state can be shared between the strategies. For example, if the systematic chunks recovery retrieved fewer chunks than needed, it passes them over to the next FetchChunks strategy, which only needs to recover the remaining chunks.
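The chaining and state hand-over described above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code from this PR: the `FetchFrom` toy strategy, the `run_chain` helper, the `threshold` argument and the exact shape of `State` and `RecoveryError` are assumptions made for the example (the real strategies are async and talk to the network).

```rust
use std::collections::{BTreeMap, VecDeque};

/// Mutable data passed from one strategy to the next (simplified, hypothetical layout).
#[derive(Default)]
pub struct State {
    /// Chunks received so far, keyed by chunk index.
    pub received_chunks: BTreeMap<u32, Vec<u8>>,
}

#[derive(Debug, PartialEq)]
pub enum RecoveryError {
    Unavailable,
}

/// Simplified, synchronous stand-in for the strategy interface (an assumption).
pub trait RecoveryStrategy {
    fn display_name(&self) -> &'static str;

    /// Returns how many new chunks this strategy fetched, if `threshold` was reached.
    fn run(&mut self, state: &mut State, threshold: usize) -> Result<usize, RecoveryError>;
}

/// Toy strategy that can only obtain the chunk indices listed in `reachable`.
pub struct FetchFrom {
    pub name: &'static str,
    pub reachable: Vec<u32>,
}

impl RecoveryStrategy for FetchFrom {
    fn display_name(&self) -> &'static str {
        self.name
    }

    fn run(&mut self, state: &mut State, threshold: usize) -> Result<usize, RecoveryError> {
        let mut fetched = 0;
        for &index in &self.reachable {
            if state.received_chunks.len() >= threshold {
                break;
            }
            // Skip chunks that an earlier strategy already stored in the shared state.
            if !state.received_chunks.contains_key(&index) {
                state.received_chunks.insert(index, vec![index as u8]);
                fetched += 1;
            }
        }
        if state.received_chunks.len() >= threshold {
            Ok(fetched)
        } else {
            Err(RecoveryError::Unavailable)
        }
    }
}

/// Runs the strategies in order until one succeeds; the state outlives each attempt.
pub fn run_chain(
    mut strategies: VecDeque<Box<dyn RecoveryStrategy>>,
    threshold: usize,
) -> Result<(&'static str, usize), RecoveryError> {
    let mut state = State::default();
    while let Some(mut strategy) = strategies.pop_front() {
        if let Ok(fetched) = strategy.run(&mut state, threshold) {
            return Ok((strategy.display_name(), fetched));
        }
        // On failure, fall through to the next strategy with the chunks gathered so far.
    }
    Err(RecoveryError::Unavailable)
}
```

With a threshold of 4, a first strategy that can reach only chunks 0 and 1 fails but leaves both chunks in the state, so a second strategy that can reach chunks 1 to 4 only has to fetch two more.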

Draft example of how a systematic chunk recovery strategy would look: 667d870 (notice how easy it was to add and reuse code)

Note that this PR doesn't itself add any new strategy; it should fully preserve backwards compatibility in terms of functionality. Follow-up PRs will add new strategies.

sandreim

Thanks @alindima . I like the direction this is going. There are a few things I think can be improved and also we should add some tests to cover usage of multiple strategies.


sandreim · 2023-09-11T08:03:33Z

polkadot/node/network/availability-recovery/src/task.rs

+/// Intermediate/common data that must be passed between `RecoveryStrategy`s belonging to the
+/// same `RecoveryTask`.
+pub struct State {
+	/// Chunks received so far.


I think we can move the RecoveryParams here. These shouldn't change across strategies.

alindima · 2023-09-12

We could, but I purposefully placed only mutable data into the State. As you said, params don't change across strategies, so we only borrow them immutably. That's why I left them separate. They are still shared across strategies, but by virtue of the RecoveryTask struct.
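A minimal sketch of that split, assuming hypothetical field names (`threshold`, `n_validators`, a plain `Vec<u32>` of chunk indices) and a hypothetical `step` helper that are not taken from the PR: the task owns both structs and lends the params immutably and the state mutably to whatever strategy step is running.

```rust
/// Immutable inputs, fixed when the task is created (hypothetical fields).
pub struct RecoveryParams {
    pub threshold: usize,
    pub n_validators: usize,
}

/// Mutable progress data; the only thing a strategy step may change.
#[derive(Default)]
pub struct State {
    pub received_chunks: Vec<u32>,
}

/// Owns both pieces, so every strategy step sees the same params and state.
pub struct RecoveryTask {
    params: RecoveryParams,
    state: State,
}

impl RecoveryTask {
    pub fn new(params: RecoveryParams) -> Self {
        Self { params, state: State::default() }
    }

    /// Lends the params immutably and the state mutably to one step.
    /// The two borrows touch disjoint fields, so the borrow checker accepts them.
    pub fn step(&mut self, step: fn(&RecoveryParams, &mut State)) {
        step(&self.params, &mut self.state)
    }

    pub fn is_done(&self) -> bool {
        self.state.received_chunks.len() >= self.params.threshold
    }

    pub fn received(&self) -> &[u32] {
        &self.state.received_chunks
    }
}

/// Toy step: records the next chunk index, never exceeding the validator count.
pub fn fetch_one(params: &RecoveryParams, state: &mut State) {
    let next = state.received_chunks.len();
    if next < params.n_validators {
        state.received_chunks.push(next as u32);
    }
}
```

Because a step only ever receives `&RecoveryParams`, the compiler rules out accidental changes to the params between strategies, which is the property the separation is meant to provide.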


alindima · 2023-09-12T11:52:25Z

Thanks @alindima . I like the direction this is going.

Thanks!

There are a few things I think can be improved and also we should add some tests to cover usage of multiple strategies.

In regards to tests:
This PR shouldn't change any of the logic concerning availability-recovery. It's meant to preserve functionality while allowing more hotpaths and optimisations to be added later on (which will come accompanied by a lot of tests).

I saw that there is already a good number of tests exercising the multiple recovery paths.

Sounds good for now?


ordian

looks reasonable


ordian · 2023-09-16T09:12:37Z

polkadot/node/network/availability-recovery/src/task.rs

@@ -222,7 +222,7 @@ impl State {
 		sender
 			.send_message(NetworkBridgeTxMessage::SendRequests(
 				requests,
-				IfDisconnected::TryConnect,
+				IfDisconnected::ImmediateError,


See paritytech/polkadot#6081

alindima · 2023-09-18

Thanks! According to what I read on the issue, the cumulus 0002-pov_recovery test should be failing with the current code of the PR, right? But it's not.

I'll roll back to using TryConnect just to be safe in this PR.
I'll think about the optimisations we can do in upcoming PRs, because I'd rather have an incremental approach and keep this refactoring backwards-compatible. @eskimor @sandreim sounds good?

If so, can we have this PR merged if it looks good?

ordian · 2023-09-19

Yes, sounds good. The fact that the test is passing with ImmediateError is concerning; please open an issue, as this needs further investigation. cc @skunert

skunert · 2023-09-20

I investigated this a bit. To me it looks like zombienet omits some of the args we pass to the collator: in the logs, I see no trace of the --reserved-only flag that we pass in the toml. See paritytech/zombienet#1360

alindima · 2023-09-20

Created #1647 for tracking purposes.

sandreim

Nice work @alindima


Draft of RecoveryStrategy based on linked enums

051328a

alindima added labels Sep 8, 2023: A0-needs_burnin (Pull request needs to be tested on a live validator node before merge. DevOps is notified via matrix), T0-node (This PR/Issue is related to the topic “node”), I4-refactor (Code needs refactoring)

alindima added 3 commits September 8, 2023 11:32

add copright header

166aae5

fix clippy

518d8fd

Refactor RecoveryStrategy using dynamic dispatch

dae34a5

alindima changed the title ~~RFC: Refactor availability-recovery strategies~~ Refactor availability-recovery strategies Sep 8, 2023

sandreim reviewed Sep 11, 2023


alindima added 2 commits September 12, 2023 14:47

address comments

640c2d5

Merge branch 'master' into alindima/refactor-availability-recovery-st…

ac18bab

…rategies

Merge remote-tracking branch 'origin/master' into alindima/refactor-a…

2a5d6d5

…vailability-recovery-strategies

ordian approved these changes Sep 14, 2023


alindima added 2 commits September 15, 2023 12:24

don't use the backing group if chunk size query failed

bf39ba0

add ImmediateError to chunks recovery strategy

34408b5

ordian reviewed Sep 16, 2023


alindima added 3 commits September 18, 2023 10:41

more review comments

1569e49

fix test

d5c32d1

rollback to using TryConnect for fetch chunks

cb42bef

sandreim approved these changes Sep 18, 2023


alindima added 3 commits September 19, 2023 09:52

replace BypassAvStore variant with a separate flag

b82b2a0

move requesting_chunks to the strategy

8585628

Merge remote-tracking branch 'origin/master' into alindima/refactor-a…

2f9b767

…vailability-recovery-strategies

This was referenced Sep 20, 2023

Add availability-recovery from systematic chunks #1644

Merged

Cumulus pov_recovery zombienet test is not failing when doing recovery with IfDisconnected::ImmediateError #1647

Closed

alindima merged commit 6f00edb into master Sep 20, 2023
11 checks passed

alindima deleted the alindima/refactor-availability-recovery-strategies branch September 20, 2023 12:56

kiltbot mentioned this pull request Oct 19, 2023

[AUTOMATIC] Update Polkadot dependencies from 1.1.0 to 1.2.0 KILTprotocol/kilt-node#571

Closed

ahmadkaouk mentioned this pull request Nov 16, 2023

Update polkadot-sdk from v.1.1.0 to v1.3.0 moonbeam-foundation/moonbeam#2565

Closed
